ABSTRACT 

We determine the galaxy counts-in-cells distribution from the Sloan Digital Sky Survey (SDSS) for 3D spherical cells in redshift space as well as for 2D projected cells. We find that cosmic variance in the SDSS causes the counts-in-cells distributions in different quadrants to differ from each other by up to 20%. We also find that, within this cosmic variance, the overall galaxy counts-in-cells distribution agrees with both the gravitational quasi-equilibrium distribution and the negative binomial distribution. In addition, brighter galaxies are more strongly clustered than they would be if they were randomly selected from a larger complete sample that includes galaxies of all luminosities. The results suggest that bright galaxies could be in dark matter haloes separated by less than ~10 $h^{-1}$ Mpc. 
Subject headings: galaxies: statistics — cosmology: theory — large-scale structure of universe - 
gravitation 



1. INTRODUCTION 

The galaxy counts-in-cells distribution is a simple but powerful statistic that characterizes the locations of galaxies in space. It includes statistical information on voids and other underdense regions, on clusters of all shapes and sizes, on filaments, on the probability of finding an arbitrary number of neighbors around randomly located positions, on counts of galaxies in randomly located cells of arbitrary shapes and sizes, and on galaxy correlation functions of all orders. These are just some of its representations (Saslaw 2000; Saslaw & Yang 2010). Moreover, it is closely related to the distribution function of the peculiar velocities of galaxies around the Hubble flow (Saslaw et al. 1990; Leong & Saslaw 2004). 

Although the counts-in-cells distribution contains a large amount of information about galaxy clustering, it has not received as much attention as more common statistical descriptions of clustering such as the two-point correlation function. In addition, most earlier studies have focused on the counts-in-cells distribution for a magnitude-limited sample in projection (e.g. Sivakoff & Saslaw 2005 and references therein). While there have been studies that have used redshift-limited samples (e.g. Saslaw & Haque-Copilah 1998; Rahmani et al. 2009), their samples were generally smaller and their statistics were less precise. 

Other studies have also examined the void probability function, which is a special case of the counts-in-cells distribution that describes the distribution of the volumes of voids, or regions with no galaxies. However, studies (Saslaw & Hamilton 1984; Fry 1986) have shown that the void probability function can be entirely described by the volume integral of the two-point correlation function, $\bar{\xi}_2$, and the mean number of galaxies in a cell, $\bar{N}$. This suggests that the void probability function alone is insufficient to describe the clustering of galaxies completely; to do so we have to consider more than just voids. 



Various statistical descriptions of the distribution function have been developed (for an early review see Fry 1986), with the gravitational quasi-equilibrium distribution (GQED; Saslaw & Hamilton 1984; Ahmad et al. 2002) and the negative binomial distribution (NBD; Elizalde & Gaztañaga 1992; Sheth 1995) in common use. While the GQED can be derived from thermodynamics (Saslaw & Hamilton 1984) and statistical mechanics (Ahmad et al. 2002), the NBD has been shown by Saslaw & Fang (1996) to violate the second law of thermodynamics. 

Observations, however, show a more complex picture. While the counts-in-cells distribution for the 2MASS catalog in projection shows good agreement with the GQED (Sivakoff & Saslaw 2005), an analysis of the void probability function for the SDSS and DEEP2 catalogs in redshift space by Conroy et al. (2005) suggests closer agreement with the NBD. This disagreement between projection and redshift space complicates our understanding of the theory behind the clustering of galaxies and raises a number of questions. Why does the observed counts-in-cells distribution agree with the NBD in some cases and with the GQED in others? Under what conditions does the counts-in-cells distribution agree more closely with the GQED or with the NBD? Moreover, should the universe be allowed to violate the second law of thermodynamics? 

In section 2 we describe the distribution functions and the information they contain. In particular, we describe the derivation and some aspects of the GQED and NBD. In section 3 we describe the procedure used to measure the counts-in-cells distribution from the SDSS NYU-VAGC catalog (Blanton et al. 2005). In section 4 we present our results for the two-point correlation function and $f_V(N)$. In section 5 we summarize our findings. Following Blanton et al. (2005), we use $\Omega_m = 0.3$, $\Omega_k = 0.0$, $\Omega_\Lambda = 0.7$ and $H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$. 
2. DISTRIBUTION FUNCTIONS 

The most general form of the counts-in-cells distribution is denoted by $f(N, V)$, which gives the probability of finding $N$ galaxies in a region of space with volume $V$. There are two approaches to studying this distribution. The first is to hold $V$ constant, resulting in $f_V(N)$, which gives the distribution of the number of galaxies $N$ for cells of a given volume $V$. This method is simple to use, yet powerful. The measurement of $f_V(N)$ generally involves examining cells in 3D space or in projection and counting the number of galaxies in each cell. 

In addition, the moments of $f_V(N)$ are closely related to the volume integrals of the correlation functions of all orders (e.g. Peebles 1980; Fry 1985; Saslaw 2000), and the volume integrals of the correlation functions can be measured from the moments of $f_V(N)$ (Fry & Gaztañaga 1994). For example, the relations between the volume integrals of the 2-point and 3-point correlation functions and the moments of the counts-in-cells distribution are given by 
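$$\overline{(\Delta N)^2} = \bar{N} + \bar{N}^2\,\bar{\xi}_2 \tag{1}$$
$$\overline{(\Delta N)^3} = \bar{N} + 3\bar{N}^2\,\bar{\xi}_2 + \bar{N}^3\,\bar{\xi}_3 \tag{2}$$
where $\bar{N}$ is the mean number of galaxies in a cell, and the volume integral of the $A$-point correlation function
$$\bar{\xi}_A = \frac{1}{V^A}\int_V \cdots \int_V \xi_A(\mathbf{r}_1, \ldots, \mathbf{r}_A)\, dV_1 \cdots dV_A, \tag{3}$$
with $\bar{\xi}_1 = 1$, depends on the cell volume $V$. This property allows us to compare the counts-in-cells results with observations of the two-point correlation function. 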

To get the measured value of the two-point correlation function $\xi_2(r)$, we rewrite equation (3) for the 2-galaxy case as
$$\bar{\xi}_2(r) = \frac{1}{V(r)}\int_0^r \frac{dV}{dr'}\,\xi_2(r')\,dr'. \tag{4}$$



This is a conditional average correlation in which one galaxy is located at the center of the volume, so one power of $V$ in the denominator is removed by using polar coordinates relative to the central galaxy of the arbitrary volume. 

We can invert the integral using a finite-difference scheme with an interval $\Delta r$ to approximate the value of $\xi_2(r)$ such that
$$\xi_2(r) \approx \frac{\bar{\xi}_2(r+\Delta r)\,V(r+\Delta r) - \bar{\xi}_2(r)\,V(r)}{V(r+\Delta r) - V(r)} \tag{5}$$
where, from equation (1),
$$\bar{\xi}_2 = \frac{\overline{(\Delta N)^2} - \bar{N}}{\bar{N}^2} \tag{6}$$
and $V$ is the volume of the cell, which depends on the shape of the cell. This gives us a means of determining the two-point correlation function from a series of measurements of $f_V(N)$ over a range of scales. 
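The two steps above can be sketched in Python as follows, assuming spherical cells ($V = 4\pi r^3/3$), the population variance of the measured counts, and $\bar{\xi}_2$ already estimated at an increasing sequence of radii; the function names are hypothetical. The estimator is $\bar{\xi}_2 = [\overline{(\Delta N)^2} - \bar{N}]/\bar{N}^2$, and because $\bar{\xi}_2 V$ is the volume integral of $\xi_2$, each finite difference returns $\xi_2$ averaged over the shell between consecutive radii.

```python
import numpy as np

def xi2_bar_from_counts(counts):
    """xi2_bar = (<(Delta N)^2> - Nbar) / Nbar^2 from a sample of cell counts."""
    counts = np.asarray(counts, dtype=float)
    nbar = counts.mean()
    variance = counts.var()  # population variance, i.e. <(N - Nbar)^2>
    return (variance - nbar) / nbar**2

def sphere_volume(r):
    """Volume of a spherical cell of radius r."""
    return 4.0 / 3.0 * np.pi * np.asarray(r, dtype=float) ** 3

def xi2_from_xi2_bar(r, xi2_bar, volume=sphere_volume):
    """Finite-difference inversion of xi2_bar(r).

    Returns len(r) - 1 values; element k is the volume-weighted average of
    xi_2 over the shell between r[k] and r[k + 1].
    """
    v = volume(np.asarray(r, dtype=float))
    weighted = np.asarray(xi2_bar, dtype=float) * v  # xi2_bar * V = integral of xi_2 dV
    return np.diff(weighted) / np.diff(v)

# Power law xi_2 = (r / 5)^-1.8, for which xi2_bar = 2.5 (r / 5)^-1.8 in spheres.
radii = np.arange(2.0, 12.0, 1.0)
xi2_bar = 2.5 * (radii / 5.0) ** -1.8
print(xi2_from_xi2_bar(radii, xi2_bar))
```

For a power law $\xi_2 = (r/r_0)^{-\gamma}$ in spherical cells, $\bar{\xi}_2 = [3/(3-\gamma)]\,(r/r_0)^{-\gamma}$, which is where the factor 2.5 for $\gamma = 1.8$ comes from.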

The other approach to studying $f(N, V)$ is to hold $N$ constant, resulting in $f_N(V)$, which gives the distribution of the volume $V$ occupied by $N$ galaxies; the void probability function (VPF), where $N = 0$, is a special case (e.g. Crane & Saslaw 1986). A theoretical approach to $f_N(V)$ is complicated by the fact that the distribution depends on the correlation function at all scales rather than at a scale determined by a particular value of $V$. This scale dependence can be found either empirically, from the dependence of the variance of the $f_V(N)$ distribution on $V$, or from a model assumption for the form of $\bar{\xi}_2(V)$. These give the analytic form of $f_N(V)$. To avoid these complications, most attempts to study $f_N(V)$ have focused on the VPF, because use of the reduced void probability (Fry 1986) considerably simplifies the analysis by expressing $f_0(V)$ in terms of $\bar{N}\bar{\xi}_2$. 
The reduced void probability, given by
$$\chi = -\frac{\ln f_0(V)}{\bar{N}} \tag{7}$$
provides a means of isolating the scale dependence of the void probability function, because $\chi$ is a function that depends only on $\bar{N}\bar{\xi}_2$, and $\bar{N}\bar{\xi}_2$ is easily derived from the variance of $f_V(N)$. However, this simplification is only possible for voids in the GQED and NBD because, for $N \geq 1$, $f_N(V)$ depends on $\bar{N}$ and $\bar{\xi}_2$ separately. Moreover, the void distribution is relatively insensitive to information on large cell sizes because large cells are unlikely to be completely empty. For these reasons we focus on the simpler $f_V(N)$ approach in this paper, and we now introduce the statistical descriptions of the counts-in-cells distribution. 
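To make the dependence on $\bar{N}\bar{\xi}_2$ concrete: setting $N = 0$ in the GQED of equation (8) gives $f_0 = e^{-\bar{N}(1-b)}$, so with $b$ from equation (11), $\chi_{\rm GQED} = (1 + \bar{N}\bar{\xi}_2)^{-1/2}$. For the NBD we assume the standard void probability $f_0 = (1 + \bar{N}\bar{\xi}_2)^{-1/\bar{\xi}_2}$, giving $\chi_{\rm NBD} = \ln(1 + \bar{N}\bar{\xi}_2)/(\bar{N}\bar{\xi}_2)$, which equals $1 - b$ for the NBD form of $b$ in equation (14). A short sketch with hypothetical function names:

```python
import math

def chi_gqed(n_xi):
    """Reduced void probability of the GQED: chi = 1 - b = (1 + Nbar*xi2_bar)^(-1/2)."""
    return (1.0 + n_xi) ** -0.5

def chi_nbd(n_xi):
    """Reduced void probability of the NBD: chi = ln(1 + Nbar*xi2_bar) / (Nbar*xi2_bar)."""
    if n_xi == 0.0:
        return 1.0  # Poisson limit: f_0 = exp(-Nbar)
    return math.log1p(n_xi) / n_xi

for n_xi in (0.5, 1.0, 3.0, 10.0):
    print(f"Nbar*xi2_bar = {n_xi:5.1f}   GQED {chi_gqed(n_xi):.4f}   NBD {chi_nbd(n_xi):.4f}")
```

Because $\chi_{\rm NBD} < \chi_{\rm GQED}$ for these values and $f_0 = e^{-\bar{N}\chi}$, the NBD predicts a larger fraction of empty cells than the GQED at the same $\bar{N}\bar{\xi}_2$.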

2.1. The GQED 
The gravitational quasi-equilibrium distribution was first derived from thermodynamics (Saslaw & Hamilton 1984) and subsequently from statistical mechanics (Ahmad et al. 2002) by assuming that galaxy clustering evolves through a sequence of quasi-equilibrium states. The resulting distribution is given by 



$$f_{V,\mathrm{GQED}}(N) = \frac{\bar{N}(1-b)}{N!}\left[\bar{N}(1-b) + Nb\right]^{N-1} e^{-\bar{N}(1-b) - Nb} \tag{8}$$



where $\bar{N} = \bar{n}V$ is the average expected number of galaxies in a cell of volume $V$ and $\bar{n}$ is the average number density of galaxies. Here $b = -W/2K$ is the ratio of the gravitational correlation energy $W$ to twice the kinetic energy $K$ of peculiar velocities relative to the Hubble flow, and it represents a measure of clustering. 
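Equation (8) can be evaluated directly. The sketch below (hypothetical function name) works in log space, our choice to avoid overflow of $N!$ and of the power term at large $N$, and checks numerically that the distribution has mean $\bar{N}$ and variance $\bar{N}/(1-b)^2$, the variance relation used below to fix $b$ from the data.

```python
import math

def gqed_pmf(n, nbar, b):
    """GQED probability of n galaxies in a cell with mean count nbar and 0 <= b < 1.

    f(N) = Nbar(1-b)/N! * [Nbar(1-b) + N b]^(N-1) * exp(-Nbar(1-b) - N b),
    evaluated through its logarithm.
    """
    a = nbar * (1.0 - b)  # Nbar (1 - b)
    log_p = (math.log(a) + (n - 1) * math.log(a + n * b)
             - a - n * b - math.lgamma(n + 1))
    return math.exp(log_p)

nbar, b = 5.0, 0.4
probs = [gqed_pmf(n, nbar, b) for n in range(500)]
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(sum(probs), mean, var)  # variance should approach nbar / (1 - b)^2
```

For $b = 0$ the expression reduces to the Poisson distribution, as expected for an unclustered sample.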



A physical description of $b$ is given by Ahmad et al. (2002) as
$$b = \frac{\frac{3}{2}(Gm^2)^3\,\bar{n}\,T^{-3}}{1 + \frac{3}{2}(Gm^2)^3\,\bar{n}\,T^{-3}} \tag{9}$$
which relates $b$ to the mass of a galaxy $m$, the number density of galaxies $\bar{n}$ and the kinetic temperature of the galaxies $T$. Here $G$ is the gravitational constant. Originally an ansatz proposed by Saslaw & Hamilton (1984), the physical origin of $b$ was only later understood through work by Saslaw & Fang (1996) on the first and second laws of thermodynamics, and through the statistical mechanical derivation of the GQED by Ahmad et al. (2002). 

We can relate the clustering parameter $b$ to the variance of the counts-in-cells distribution through
$$\overline{(\Delta N)^2} = \frac{\bar{N}}{(1-b)^2} \tag{10}$$
which allows us to describe the clustering of galaxies with the GQED in a self-consistent manner with no free parameters. This also allows us to relate $b$ to the volume integral of the two-point correlation function such that
$$b = 1 - \left[\bar{N}\bar{\xi}_2(V) + 1\right]^{-1/2} \tag{11}$$
which indicates that $b$ depends on $\bar{\xi}_2$ and varies with cell volume $V$. 
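Combining the GQED variance $\overline{(\Delta N)^2} = \bar{N}/(1-b)^2$ with the definition $\bar{\xi}_2 = [\overline{(\Delta N)^2} - \bar{N}]/\bar{N}^2$ gives two equivalent routes from a set of cell counts to $b$, with no free parameters. A minimal sketch with hypothetical function names:

```python
import numpy as np

def b_from_counts(counts):
    """GQED b from the variance relation <(Delta N)^2> = Nbar / (1 - b)^2."""
    counts = np.asarray(counts, dtype=float)
    nbar, variance = counts.mean(), counts.var()
    return 1.0 - np.sqrt(nbar / variance)

def b_from_xi2_bar(nbar, xi2_bar):
    """GQED b = 1 - (Nbar * xi2_bar + 1)^(-1/2)."""
    return 1.0 - (nbar * xi2_bar + 1.0) ** -0.5

counts = [0, 2, 4]
nbar = np.mean(counts)
xi2_bar = (np.var(counts) - nbar) / nbar**2
print(b_from_counts(counts), b_from_xi2_bar(nbar, xi2_bar))  # the two routes agree
```

Counts with a variance below the Poisson value $\bar{N}$ give $b < 0$, outside the physical range $0 \leq b < 1$.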

Although the derivation of the GQED by Ahmad et al. (2002) was done assuming that all galaxies have the same mass, theoretical work by Ahmad et al. (2006) showed that the statistical mechanical framework can be extended to take into account population components of differing masses. In addition, N-body simulations by Itoh et al. (1993) showed that the GQED for galaxies of equal mass is often a good fit to N-body simulations in which galaxies are allowed to take a range of masses. This suggests that the GQED given in equation (8) is a reasonable approximation to the counts-in-cells distribution. Together with the physical motivation behind its derivation, this means the GQED can be used to gain further insight into the physics behind the counts-in-cells distribution. 

2.2. The NBD 

The negative binomial distribution was proposed in the cosmological context by Carruthers & Minh (1983) and subsequently derived by Elizalde & Gaztañaga (1992) by describing the distribution as a statistical random process in which $N$ galaxies are introduced into $m$ spatially disconnected boxes. In this model, the probability that a galaxy is introduced into a particular box is proportional to the number of galaxies already inside the box. The resulting distribution is 
where $g$ is a clustering parameter that depends on cell volume and $\Gamma$ is the standard gamma function. Similar to the GQED, the NBD can also describe the counts-in-cells distribution self-consistently with no free parameters, and the clustering parameter $g$ is just $\bar{\xi}_2$. 
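A sketch of the NBD, assuming the widely used parameterization with mean $\bar{N}$ and variance $\bar{N} + \bar{N}^2 g$, where $g = \bar{\xi}_2$ as stated above:
$$f_V(N) = \frac{\Gamma(N + 1/g)}{\Gamma(1/g)\,N!}\,\frac{(\bar{N}g)^N}{(1 + \bar{N}g)^{N + 1/g}}.$$
This explicit form is our assumption, and the function name is hypothetical; the numerical check recovers $\bar{N}$ and $g$ from the moments.

```python
import math

def nbd_pmf(n, nbar, g):
    """Negative binomial counts-in-cells probability with mean nbar and g = xi2_bar > 0.

    Assumed form: Gamma(n + 1/g) / (Gamma(1/g) n!) * (nbar g)^n / (1 + nbar g)^(n + 1/g),
    evaluated through its logarithm.
    """
    k = 1.0 / g
    log_p = (math.lgamma(n + k) - math.lgamma(k) - math.lgamma(n + 1)
             + n * math.log(nbar * g) - (n + k) * math.log1p(nbar * g))
    return math.exp(log_p)

nbar, g = 5.0, 0.2
probs = [nbd_pmf(n, nbar, g) for n in range(400)]
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(mean, (var - mean) / mean**2)  # recovers nbar and g = xi2_bar
```

Setting $N = 0$ gives $f_0 = (1 + \bar{N}\bar{\xi}_2)^{-1/\bar{\xi}_2}$, the void probability behind the NBD form of $b$ in equation (14).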

An alternative derivation of the NBD in the thermodynamic framework of Saslaw & Hamilton (1984) is given by Sheth (1995). In this case, the equivalent of $b$ is given by
$$b = 1 - \frac{\ln(1 + \bar{N}\bar{\xi}_2)}{\bar{N}\bar{\xi}_2}. \tag{14}$$
Although this form fulfils $0 < b < 1$, Saslaw & Fang (1996) found that it violates the second law of thermodynamics, which suggests that the NBD is not physically motivated. A closer look at the statistical random process from which the NBD was derived suggests that the NBD assumes galaxies form where there is already a cluster of galaxies. This process does not take infall into account, and hence it neglects the depletion of regions outside a cluster that occurs during infall. 

From the derivation of the NBD by Elizalde & Gaztañaga (1992), we note that the NBD can describe the case where galaxies form from the merger of less massive objects. In this description, the less massive objects can be expected to follow the GQED, but not all of them can be observed. These objects may merge to form objects bright enough to be observed, and their locations are likely to be in denser regions that contain a higher density of fainter objects. N-body simulations by Conroy et al. (2005) show that while the VPF for galaxies follows the NBD, the VPF for dark matter particles follows the GQED. While this qualitative explanation may seem plausible, a detailed quantitative analysis will depend on the physics of the more complicated halo occupation distribution. 

3. DATA AND PROCEDURE 
3.1. Catalog data 
The New York University Value-Added Galaxy Catalog (NYU-VAGC; Blanton et al. 2005) is a composite catalog with Sloan Digital Sky Survey (SDSS) data as its primary component. It contains over 550,000 galaxies with their redshifts and positions on the sky. The catalog also contains extinction-corrected and K-corrected absolute magnitudes for 8 bands, of which the u, g, r, i and z bands come from the SDSS and the J, H and K_s bands come from the 2-Micron All-Sky Survey (2MASS), although for this study we use only the data from the SDSS. The galaxies in the catalog are also corrected for fiber collisions using the "nearest" method described in Blanton et al. (2005). Less than 10% of the galaxies are affected by this correction, which allows for a more complete sample in crowded regions. 

In addition to the galaxy catalog, the NYU-VAGC also contains a survey geometry catalog that describes the survey footprint in terms of spherical polygons (described in Blanton et al. 2005). Since the SDSS is not an all-sky survey, the survey footprint determines the positions of cells and allows us to lay down cells where there is valid data. 

For this work, we use the large-scale structure samples in the version of the catalog corresponding to the seventh data release of the SDSS (DR7; Abazajian et al. 2009). We use the subsample with a flux limit of r < 17.6 and perform further selection cuts based on the properties of the sample. In particular, we choose absolute magnitude cuts to obtain a complete sample within a given redshift range. 

We consider two redshift ranges in the g, r and i bands: 0.04 < z < 0.12 and 0.12 < z < 0.20. The low-redshift limit of z > 0.04 ensures that the sample is within the Hubble flow and excludes the Coma and Virgo clusters. Since the SDSS "great wall" spans the redshift range 0.065 < z < 0.09 (Gott et al. 2005), it is fully contained within the low redshift range. This allows us to isolate the effect of the "great wall" by comparing the low redshift range to the high redshift range. 

To determine a suitable absolute magnitude cut, we define the faint limit $M_f$ as the absolute magnitude at which the observed luminosity function begins to turn over because the limiting magnitude has been reached. This means that for a faint limit $M_f$ and limiting redshift $z_{\max}$, the comoving number density $n(M_f)$ of galaxies brighter than $M_f$ should be approximately the same for any limiting redshift $z < z_{\max}$. 

[Figure 1: Selected subsamples; observed luminosity function plotted against $M - 5\log h$]

Figure 1. Observed luminosity function for the NYU-VAGC at 0.04 < z < 0.12 (top panel) and 0.12 < z < 0.20 (bottom panel). The vertical lines indicate the absolute magnitude cuts we have adopted. 

We obtain $M_f$ by comparing $n(M_f)$ for a redshift range with that for a lower redshift subset of the same range. For the low redshift range, we compare the range 0.04 < z < 0.12 with the range 0.04 < z < 0.10, and for the high redshift range we compare the range 0.12 < z < 0.20 with the range 0.12 < z < 0.18. The optimal faint limit $M_f$, which gives us the largest complete sample, occurs where the compared values of $n(M_f)$ are approximately equal. 
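The comparison above can be sketched as a scan over trial faint limits. This is an illustration under assumptions: the function names, the 5% tolerance standing in for "approximately equal" and the toy magnitudes are hypothetical, and the comoving volumes of the two redshift ranges are supplied by the caller (for example from the analogue of equation (15) with the survey solid angle in place of $2\pi(1 - \cos\theta)$).

```python
import numpy as np

def number_density(abs_mag, volume, m_faint):
    """Comoving number density of galaxies brighter than (i.e. with M below) m_faint."""
    return np.count_nonzero(np.asarray(abs_mag) < m_faint) / volume

def faint_limit(mag_full, vol_full, mag_sub, vol_sub, candidates, tol=0.05):
    """Faintest candidate M_f at which n(M_f) in the full redshift range agrees
    with n(M_f) in the lower-redshift subset to within fractional tolerance tol.

    Candidates are scanned from bright to faint (ascending M) and the scan stops
    at the first disagreement, so every brighter cut also agrees.
    """
    best = None
    for m_f in sorted(candidates):
        n_sub = number_density(mag_sub, vol_sub, m_f)
        if n_sub == 0:
            continue  # no galaxies this bright; nothing to compare
        n_full = number_density(mag_full, vol_full, m_f)
        if abs(n_full / n_sub - 1.0) > tol:
            break
        best = m_f
    return best

# Toy data: the full range (volume 2) is missing galaxies fainter than M = -20.
sub = [-22.0, -21.0, -20.0, -19.0]
full = [-22.0, -22.0, -21.0, -21.0, -20.0, -20.0]
print(faint_limit(full, 2.0, sub, 1.0, candidates=[-21.0, -20.0, -19.0, -18.0]))
```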

We find that the lower redshift range is complete for $M_g < -19.5$, $M_r < -20.2$ and $M_i < -20.6$, while the higher redshift range is complete for $M_g < -20.7$, $M_r < -21.5$ and $M_i < -21.9$. We plot these limits on the observed luminosity function at 0.04 < z < 0.12 and 0.12 < z < 0.20 in Figure 1 and summarize the subsamples we use in Table 1. 

Here we note that the 1a(g), 1a(r) and 1a(i) samples have similar spatial densities, and likewise the 1b(g), 1b(r) and 1b(i) samples, and the 2b(g), 2b(r) and 2b(i) samples, also have similar spatial densities. Hence any large differences in clustering between the samples of different color should arise from selection effects that depend on color. 

3.2. Cosmic Variance 



An analysis by Sylos Labini et al. (2009) found systematic variations between different subvolumes of the SDSS catalog on scales larger than 30 $h^{-1}$ Mpc, such that these subvolumes are not statistically similar. These variations are likely to be caused by cosmic variance, which our analysis should take into account. We use two approaches to analyze the effect of cosmic variance on our results. 

The first is simply to consider independent subfields of the survey footprint. To ensure that effects of Galactic latitude, distance and lookback time are constant across all subsamples, we compute and compare the counts-in-cells distribution for cells in non-overlapping quadrants in Galactic longitude. The size of the quadrants is also comparable with the size of the SDSS "great wall", which spans ~70° (Deng et al. 2006). In this approach, all cells that belong to a quadrant are fully contained within the selected quadrant. This gives us a picture of the variations among widely separated areas of the sky. We consider subsamples in four quadrants such that, for Galactic longitude $l$, quadrant 1 (q1) covers 0.0° < l < 90.0°, quadrant 2 (q2) covers 90.0° < l < 180.0°, quadrant 3 (q3) covers 180.0° < l < 270.0° and quadrant 4 (q4) covers 270.0° < l < 360.0°. For some samples, we also examine the quadrant-to-quadrant variations when the quadrant boundaries have been shifted by 30°, 45° and 60° in Galactic longitude, to check that any variation between quadrants is not caused by our choice of quadrant boundaries. 
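Deciding whether a circular cell lies entirely inside one quadrant needs the largest longitude offset on a circle of angular radius $\theta$ centred at Galactic latitude $b_G$ (written $b_G$ to avoid confusion with the clustering parameter $b$), which is $\arcsin(\sin\theta/\cos b_G)$ for $\theta < 90° - |b_G|$. A sketch with hypothetical names; quadrant indices 0 to 3 correspond to q1 to q4, the optional shift reproduces the boundary tests above, and for 3D cells $\theta$ would be the projected radius of the sphere.

```python
import math

def quadrant(l_deg, shift_deg=0.0):
    """Quadrant index 0..3 for Galactic longitude l, with boundaries shifted by shift_deg."""
    return int(((l_deg - shift_deg) % 360.0) // 90.0)

def cell_in_one_quadrant(l_deg, b_deg, theta_deg, shift_deg=0.0):
    """True if a circular cell of angular radius theta centred at (l, b_G) lies
    entirely within a single longitude quadrant."""
    if theta_deg >= 90.0 - abs(b_deg):
        return False  # the cell reaches a Galactic pole and spans every longitude
    # Largest longitude offset on a small circle of radius theta about latitude b_G.
    dl = math.degrees(math.asin(math.sin(math.radians(theta_deg))
                                / math.cos(math.radians(b_deg))))
    # dl < 90 deg, so matching end-point quadrants mean no boundary is crossed.
    return quadrant(l_deg - dl, shift_deg) == quadrant(l_deg + dl, shift_deg)

print(cell_in_one_quadrant(45.0, 30.0, 5.0), cell_in_one_quadrant(88.0, 30.0, 5.0))
```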

Moreover, we have varied the size of the subsamples and found that for smaller subsamples, e.g. sixths rather than quadrants, the subsample-to-subsample variations are smaller than for larger subsamples. However, subsamples that are too large will not be independent, and there will be too few of them to provide an accurate estimate of the cosmic variance between disconnected regions of the sky. Therefore quadrants are a reasonable subsample size to use. 

The second approach is a jackknife-style approach in which we leave out cells that fall within a region of the sky selected using a quasi-random sequence from Bratley et al. (1994)¹, such that each part of the survey region is equally likely to be chosen for exclusion. For our analysis, we use 1000 different exclusion regions, which are circular with a radius of 15°. This corresponds to a transverse distance of about 60 $h^{-1}$ Mpc at z ~ 0.04 and to an area of approximately 10% of the SDSS footprint, giving a "leave 10% out" jackknife procedure from which we can determine the 1σ error. 

¹ Implemented in the GNU Scientific Library (http://www.gnu.org/software/gsl/) 

Here we wish to stress that the jackknife errors are only valid if the SDSS catalog is a representative sample of the universe. This condition essentially requires the universe to be statistically homogeneous at scales larger than the SDSS footprint. If this requirement is not met, the errors will not be meaningful because the sample is not representative of the universe. 
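A sketch of the leave-regions-out procedure under stated assumptions. The Bratley et al. (1994) generators are replaced here by a Fibonacci lattice, a different deterministic, near-uniform point set on the sphere; regions that contain no cells (outside the footprint) are skipped; and the 1σ error uses the standard delete-d jackknife variance, $(1-f)/f$ times the scatter of the estimates for a deleted fraction $f$, which is our assumption about the estimator. Because overlapping cells are not independent, the result inherits the caveat above. All names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(n):
    """n near-uniform points on the sphere as (lon, lat) in degrees (Fibonacci lattice)."""
    i = np.arange(n)
    lat = np.degrees(np.arcsin(1.0 - (2.0 * i + 1.0) / n))  # equal steps in sin(lat)
    lon = np.degrees(i * np.pi * (3.0 - np.sqrt(5.0))) % 360.0  # golden-angle increments
    return lon, lat

def angular_sep(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees (haversine formula)."""
    l1, b1, l2, b2 = map(np.radians, (lon1, lat1, lon2, lat2))
    h = np.sin((b2 - b1) / 2) ** 2 + np.cos(b1) * np.cos(b2) * np.sin((l2 - l1) / 2) ** 2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))

def jackknife_sigma(cell_lon, cell_lat, statistic, n_regions=1000, radius=15.0):
    """Delete-d jackknife 1-sigma error of statistic(keep_mask).

    Each region removes the cells within `radius` degrees of its centre;
    regions that miss every cell are skipped.
    """
    centre_lon, centre_lat = fibonacci_sphere(n_regions)
    estimates, fractions = [], []
    for lon0, lat0 in zip(centre_lon, centre_lat):
        excluded = angular_sep(cell_lon, cell_lat, lon0, lat0) < radius
        if not excluded.any():
            continue
        estimates.append(statistic(~excluded))
        fractions.append(excluded.mean())
    estimates = np.asarray(estimates)
    f = np.mean(fractions)  # mean fraction of cells left out per region
    # Delete-d jackknife variance: (1 - f) / f times the scatter of the estimates.
    return np.sqrt((1.0 - f) / f * np.mean((estimates - estimates.mean()) ** 2))
```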

3.3. Counts-in-cells strategy 

To obtain the counts-in-cells distribution, we take a sample of cells whose positions are evenly distributed over the survey footprint. To sample the galaxies efficiently, we use a number of cells approximately equal to the number of galaxies, so based on Table 1 we use approximately 130,000 to 140,000 cells. 

To ensure that the entire survey footprint is sampled without bias toward any particular region of space, we use the following procedure. We first define an "instance" as a set of cells that tile the entire survey region with a consistent amount of overlap. For small cell sizes, a single instance provides enough cells for reliable statistics. For larger cell sizes, we use multiple instances for efficient sampling. To do so, we displace the origin of each subsequent instance by a quasi-random sequence from Bratley et al. (1994)¹ so that cells from no two instances exactly line up with each other. Because we may be dealing with cells on small scales and because cells are allowed to overlap, adjacent cells are generally not independent, and hence we cannot use statistical tests that assume samples are independent. 
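The instance offsets can come from any low-discrepancy sequence. As a stand-in for the Bratley et al. (1994) generators, this hypothetical sketch uses the Halton sequence (bases 2, 3 and 5) scaled to the grid spacing; leaving the first instance unshifted is our choice.

```python
def halton(index, base):
    """Element `index` (>= 1) of the van der Corput sequence in the given prime base."""
    fraction, result = 1.0, 0.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result

def instance_offsets(n_instances, spacing, dim):
    """Grid-origin offsets in [0, spacing) for each instance; instance 0 is unshifted."""
    bases = (2, 3, 5)[:dim]
    offsets = [(0.0,) * dim]
    for k in range(1, n_instances):
        offsets.append(tuple(spacing * halton(k, p) for p in bases))
    return offsets

print(instance_offsets(4, 2.0, 3))
```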

Since the SDSS is not an all-sky survey, we also check each cell against the survey coverage area. We express the projected extent of each cell in terms of spherical polygons and check these against the survey geometry of the NYU-VAGC. We use the mangle software package (Hamilton & Tegmark 2004; Swanson et al. 2008) to combine the survey geometry and the bright star mask into a combined exclusion mask that removes areas that are either not in the SDSS or are obscured by a foreground object. In addition, we discard regions with a Galactic latitude below 20° to further minimize foreground contamination from the Galaxy. 

To accept or reject a cell, we use MANGLE to compute 
the overlap between the combined exclusion mask and 
the cell. Since cells may be partly obscured by foreground 
objects even if they are well within the contiguous region 
of the survey, we accept cells that have less than 5% of 
their area masked out. This allows cells to have small 
regions that may be obscured by foreground objects while 
having a minor effect of less than 5% on our statistics. 

The counts-in-cells distribution $f_V(N)$ is then obtained by taking the histogram of the number of galaxies within a cell. For this study, we consider 2-dimensional circular cells projected on the sky and 3-dimensional spherical cells in redshift space. 

3.3.1. 2-dimensional cells projected on the sky 

For the study of the 2-dimensional projected cells, we use circular cells because their areas and membership are simple to calculate. Such cells can be represented by only two parameters, the cell radius $\theta$ and the position of the cell center. This allows the cell to be described as a spherical polygon with only one cap, which is easily processed by mangle. The area of a circular cell in steradians is given by $2\pi(1 - \cos\theta)$, and a galaxy is a member of a cell if the great-circle distance between the galaxy and the cell center is less than $\theta$. 
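The area and membership rules above translate directly into code. A minimal sketch with hypothetical names, using the spherical law of cosines for the great-circle distance and comparing cosines rather than angles:

```python
import numpy as np

def cell_area_sr(theta_deg):
    """Solid angle of a circular cell of angular radius theta: 2 pi (1 - cos theta)."""
    return 2.0 * np.pi * (1.0 - np.cos(np.radians(theta_deg)))

def in_cell(ra_deg, dec_deg, ra0_deg, dec0_deg, theta_deg):
    """Boolean mask: great-circle distance from the cell centre is less than theta."""
    ra, dec, ra0, dec0 = map(np.radians, (ra_deg, dec_deg, ra0_deg, dec0_deg))
    cos_d = np.sin(dec) * np.sin(dec0) + np.cos(dec) * np.cos(dec0) * np.cos(ra - ra0)
    # cos is decreasing on [0, pi], so d < theta  <=>  cos d > cos theta.
    return cos_d > np.cos(np.radians(theta_deg))

print(cell_area_sr(1.0), in_cell(np.array([10.0, 12.5]), np.array([0.0, 0.0]), 10.0, 0.0, 2.0))
```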

With a redshift-limited sample defined such that the redshift $z$ falls in the range $z_1 < z < z_2$, we can also determine the comoving volume of the cell with angular radius $\theta$ using
$$V(\theta; z_1, z_2) = \frac{2\pi(1 - \cos\theta)}{3}\left[D(z_2)^3 - D(z_1)^3\right] \tag{15}$$



where the comoving distance $D(z)$ is given by integrating the Friedmann equation,
$$D(z) = \frac{c}{H_0}\int_0^z \left[\Omega_m(1+z')^3 + \Omega_k(1+z')^2 + \Omega_\Lambda\right]^{-1/2} dz'. \tag{16}$$
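Equations (15) and (16) can be evaluated with a simple quadrature. This sketch assumes the flat cosmology adopted here ($\Omega_k = 0$, so the line-of-sight comoving distance is also the transverse one), uses Simpson's rule with a fixed number of steps, and expresses distances in $h^{-1}$ Mpc through $c/H_0 \approx 2997.9\,h^{-1}$ Mpc. Names are hypothetical.

```python
import numpy as np

C_OVER_H0 = 2997.92458  # c / H0 in h^-1 Mpc for H0 = 100 h km s^-1 Mpc^-1

def comoving_distance(z, omega_m=0.3, omega_k=0.0, omega_l=0.7, n_steps=2000):
    """Comoving distance in h^-1 Mpc by Simpson's rule on n_steps (even) intervals."""
    zp = np.linspace(0.0, z, n_steps + 1)
    inv_e = 1.0 / np.sqrt(omega_m * (1 + zp) ** 3 + omega_k * (1 + zp) ** 2 + omega_l)
    step = z / n_steps
    simpson = inv_e[0] + inv_e[-1] + 4.0 * inv_e[1:-1:2].sum() + 2.0 * inv_e[2:-1:2].sum()
    return C_OVER_H0 * step * simpson / 3.0

def cell_volume(theta_deg, z1, z2, **cosmo):
    """Comoving volume of a cone of angular radius theta between redshifts z1 and z2."""
    solid_angle = 2.0 * np.pi * (1.0 - np.cos(np.radians(theta_deg)))
    d1 = comoving_distance(z1, **cosmo)
    d2 = comoving_distance(z2, **cosmo)
    return solid_angle / 3.0 * (d2 ** 3 - d1 ** 3)

print(comoving_distance(0.04), comoving_distance(0.12), cell_volume(1.0, 0.04, 0.12))
```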

To obtain the 2D projected counts-in-cells distribution, we first map the celestial sphere onto an equal-area sinusoidal projection using
$$x_0 = \alpha\cos\delta, \qquad x_1 = \delta \tag{17}$$
where $\alpha$ and $\delta$ refer to the J2000.0 right ascension and declination, respectively. For each instance, we place cell centers on a square grid overlaid on this projection at intervals of $\sqrt{2}\,\theta$. Subsequent instances have $x_0$ and $x_1$ shifted by an amount less than $\sqrt{2}\,\theta$. For the 2D sample we consider cells with radii between 0.05° and 6.0° in steps of 0.05°. 
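A sketch of one instance of the 2D grid, assuming right ascension and declination in degrees and a caller-supplied origin offset (for example from a quasi-random sequence); names are hypothetical. The spacing $\sqrt{2}\,\theta$ makes half the diagonal of each grid square equal to $\theta$, so circles of radius $\theta$ on the grid points leave no gaps in the projected plane; on the sky this coverage is only approximate, because the projection distorts shapes away from the central meridian.

```python
import numpy as np

def cell_centres_2d(theta_deg, offset=(0.0, 0.0)):
    """Cell centres (RA, Dec) in degrees on a square grid in the sinusoidal
    projection x0 = RA cos(Dec), x1 = Dec, with spacing sqrt(2) * theta.

    offset: shift of the grid origin in (x0, x1), each component in [0, spacing).
    """
    step = np.sqrt(2.0) * theta_deg
    centres = []
    for dec in np.arange(-90.0 + offset[1], 90.0, step):
        cos_dec = np.cos(np.radians(dec))
        for x0 in np.arange(offset[0], 360.0 * cos_dec, step):
            centres.append((x0 / cos_dec, dec))  # invert x0 = RA cos(Dec)
    return np.array(centres)

grid = cell_centres_2d(5.0)
print(len(grid))  # of order (4 pi sr) / step^2
```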

3.3.2. 3-dimensional cells in redshift space 

For 3-dimensional cells in redshift space, we use spher- 
ical cells because they are simple to analyse. For exam- 
ple, the simplest form of equation ^ applies to spherical 
cells, and such cells can be described by just their loca- 
tion and radius r. 

To obtain the 3D counts-in-cells distribution in redshift space, we first convert redshift space into Cartesian coordinates using (cf. Blanton et al. 2005)
$$x_0 = D\cos\delta\cos\alpha, \qquad x_1 = D\cos\delta\sin\alpha, \qquad x_2 = D\sin\delta \tag{18}$$
where $\alpha$ and $\delta$ refer to the J2000.0 right ascension and declination, respectively, and $D$ is the comoving distance given by equation (16). For each instance, we place cell centers on a Cartesian grid with spacing $\sqrt{2}\,r$. Subsequent instances have the origin of the grid shifted by an amount less than $\sqrt{2}\,r$. For the 3D sample we consider cells with radii between 2.0 $h^{-1}$ Mpc and 36.0 $h^{-1}$ 
Mpc in steps of O^h,- 1 Mpc. 

Since we work in comoving coordinates, the projected area of a cell is a circle about the cell center with angular radius $\theta = \sin^{-1}(r/D)$, where $r$ is the radius of the cell in comoving coordinates. The cell center is also easily obtained from $x_0$, $x_1$ and $x_2$. Hence we can define a spherical polygon that represents the footprint of the cell on the sky in a manner similar to what we have used for 2D cells. 






To get the positions of galaxies in redshift space, we apply the transformation in equation (18). A galaxy is then a member of a cell if the distance between the galaxy and the cell center is less than $r$. 
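The 3D steps (equation (18), grid placement, membership and the projected footprint radius) can be sketched as below. Names are hypothetical, and the brute-force counting is for illustration only; for catalogs of this size a spatial tree, for instance SciPy's `cKDTree.query_ball_point` with `return_length=True`, would be the practical choice.

```python
import numpy as np

def to_cartesian(ra_deg, dec_deg, dist):
    """Redshift-space Cartesian coordinates from RA, Dec and comoving distance D."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.stack([dist * np.cos(dec) * np.cos(ra),
                     dist * np.cos(dec) * np.sin(ra),
                     dist * np.sin(dec)], axis=-1)

def grid_centres_3d(lo, hi, r, offset=(0.0, 0.0, 0.0)):
    """Cell centres on a cubic grid of spacing sqrt(2) * r inside the box [lo, hi)^3."""
    step = np.sqrt(2.0) * r
    axes = [np.arange(lo + o, hi, step) for o in offset]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

def count_in_spheres(galaxies, centres, r):
    """Number of galaxies within distance r of each centre (brute force)."""
    d2 = ((centres[:, None, :] - galaxies[None, :, :]) ** 2).sum(axis=-1)
    return (d2 < r * r).sum(axis=1)

def footprint_radius_deg(r, dist):
    """Angular radius theta = arcsin(r / D) of a sphere's projection on the sky."""
    return np.degrees(np.arcsin(r / dist))
```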

4. RESULTS 
4.1. The Two-Point Correlation Function 

Since the two-point correlation function is a well-studied description of clustering, we first compute the two-point correlation functions from the counts-in-cells distribution using the finite-difference inversion described in section 2 and compare our results with earlier work. This provides a check on the validity of our data and method. 

For this study, we focus on the power-law approximation of the two-point correlation function since we are dealing with small scales. The power-law approximation of the two-point correlation function at small scales is (e.g. Totsuji & Kihara 1969)
$$\xi_{2,3D}(r) = \left(\frac{r}{r_0}\right)^{-\gamma} \tag{19}$$



Table 2 

Two-point correlation function $\xi_{2,2D}(\theta)$ for 2D cells 



Sample 



la(g) 
la(g) 
la(g) 
la(g) 
lafe) 

la(r) 
la(r) 
la(r) 
la(r) 
la(r) 

la(i) 
la(i) 
la(i) 
la(i) 
la(i) 



Quadrant 



$\theta_0$ (°) 



All 

0° < I < 90° 
90° < Z < 180° 
180° < I < 270° 
270° < I < 360° 



0' 



All 

< I < 



90° 



90° < I < 180° 
180° < I < 270° 
270° < I < 360° 



0' 



All 

< I < 



90° 



90° < I < 180° 
180° < I < 270° 
270° < I < 360° 



K).002 
-0.001 
046+ 006 

U.U1D_ 004 

064+ - 007 

U.UD1_ Q 003 
n n57 +0.007 
u ' uo ' -0.004 
+0.008 
0.006 



0.053^ 



0.056 



0.066^ 



-+0.002 
-0.001 
063+ - 007 

U.UDO_ Q Q06 

0.0751°;°™ 
0.065l°± 



0.069 



0.004 
+0.007 
0.007 



0.068^ 



2+0.002 
-0.001 
066+ - 007 

n n7R+ - 009 
u.u/»_ 002 

O.Ub8_ 004 

-+0.007 



0.074~ 



0.006 
1.68 
1.56 
1.78 
1.81 
1.79 

1.71 
1.59 
1.80 
1.83 
1.82 

1.72 
1.60 
1.81 
1.84 
1.83 



+0.02 
0.02 

+0.04 
0.02 
+0.09 
0.05 
+0.08 
0.05 
+0.15 
0.11 

+0.02 
0.02 
+0.04 
0.02 
+0.09 
0.05 
+0.08 
0.05 
+0.15 
0.11 

+0.02 
0.02 
+0.04 
0.02 
+0.09 
0.04 
+0.08 
0.05 
+0.15 
0.11 



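for 3D cells and
$$\xi_{2,2D}(\theta) = \left(\frac{\theta}{\theta_0}\right)^{-\gamma+1} \tag{20}$$
for 2D cells. We obtain the parameters $r_0$ (or $\theta_0$) and $\gamma$ by fitting a linear relation between $\log \xi_2$ and $\log r$ (or $\log\theta$). 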
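The log-log fit can be sketched with a least-squares line; note from equation (20) that for 2D cells the fitted slope is $\gamma - 1$ rather than $\gamma$. Names are hypothetical, and the scale cut-off is a parameter supplied by the caller.

```python
import numpy as np

def fit_power_law(x, xi, x_max):
    """Fit log10(xi) = -slope * log10(x) + const using points with x < x_max.

    Returns (x0, slope) such that xi = (x / x0)^(-slope). For 3D cells slope = gamma;
    for 2D cells slope = gamma - 1.
    """
    x, xi = np.asarray(x, dtype=float), np.asarray(xi, dtype=float)
    use = (x < x_max) & (xi > 0)  # the logarithm needs positive xi
    coeffs = np.polyfit(np.log10(x[use]), np.log10(xi[use]), 1)
    slope = -coeffs[0]
    x0 = 10.0 ** (coeffs[1] / slope)  # log10 xi = slope * (log10 x0 - log10 x)
    return x0, slope

r = np.linspace(1.0, 20.0, 40)
x0, slope = fit_power_law(r, (r / 5.0) ** -1.6, x_max=10.0)
print(x0, slope)  # r0 and gamma for 3D cells
```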

Since previous studies (e.g. Connolly et al. 2002; Hawkins et al. 2003) have shown that the two-point correlation function deviates from a power law at large scales, we use data points from scales smaller than 10.0 $h^{-1}$ Mpc for 3D cells, or smaller than 1.25° for 2D cells, to determine the power-law fit. Because $\xi_2(r)$ depends on the gradient of $V\bar{\xi}_2(r)$, we use a 5-point moving average of $V\bar{\xi}_2(r)$ to obtain the overall shape of $V\bar{\xi}_2(r)$ for our analysis. We summarize our results in Tables 2 and 3. 
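A sketch of the smoothing step, assuming a centred 5-point window. As a design choice of this sketch, $V$ is smoothed with the same window as $V\bar{\xi}_2$, which guarantees that a constant $\bar{\xi}_2$ is returned unchanged by the finite difference. Names are hypothetical.

```python
import numpy as np

def moving_average(values, window=5):
    """Centred moving average; the result is shorter by window - 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def smoothed_xi2(r, xi2_bar, volume):
    """Finite-difference xi_2 after smoothing V * xi2_bar (and V) with a 5-point average."""
    v = volume(np.asarray(r, dtype=float))
    vxi = moving_average(np.asarray(xi2_bar, dtype=float) * v)
    v_smooth = moving_average(v)  # same window, so the ratio stays consistent
    return np.diff(vxi) / np.diff(v_smooth)

radii = np.linspace(2.0, 12.0, 21)
sphere = lambda x: 4.0 / 3.0 * np.pi * x ** 3
print(smoothed_xi2(radii, 2.5 * (radii / 5.0) ** -1.8, sphere))
```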

We find that the value of $\gamma$ for the 3D two-point correlation function is about 1.5 to 1.6 and that $\xi_{2,3D}(r)$ begins to deviate from a power law at scales of $r \approx$ 11 to 12 $h^{-1}$ Mpc, in agreement with earlier work such as Hawkins et al. (2003). The value of $\gamma$ for the 2D two-point correlation function is about 1.7 to 1.8, close to the value obtained by Connolly et al. (2002). We note that $\xi_{2,2D}(\theta)$ shows a break from the power-law fit at $\theta \approx 1.75°$ for the 1a and 2b samples, and at $\theta \approx$ 2.4° to 2.6° for the 1b samples. 

To compare the 2D and 3D samples, we first note that the 2D samples measure the projected correlation function while the 3D samples measure the redshift-space correlation function. The 3D samples are affected by redshift-space distortions caused by peculiar velocities, while the 2D samples, which ignore the detailed distance information, are not. Therefore the comparison between the 2D and 3D samples can help quantify the effect of these distortions. 



Totsuji & Kihara (1969) and Davis & Peebles (1983) relate the projected correlation function $\xi_{2,2D}$, as a function of projected separation $r_p$, to the real-space correlation function $\xi_2(r_{\rm real})$ by 



lb(g) 




All 




0.140 


lb(g) 


0° 


< I < 


90° 


0.159 


lb(g) 


90° 


< I < 


180° 


0.174 


lb(g) 


180° 


< I < 


270° 


0.113 


lb(g) 


270° 


< I < 


360° 


0.113 


lb(r) 




All 




0.169 


lb(r) 


0° 


< I < 


90° 


0.172 


lb(r) 


90° 


< I < 


180° 


0.212 


lb(r) 


180° 


< I < 


270° 


0.141 


lb(r) 


270° 


< I < 


360° 


0.148 


lb(i) 




All 




0.168 


lb(i) 


0° 


< I < 


90° 


0.173 


lb(i) 


90° 


< I < 


180° 


0.200 


lb(i) 


180° 


< I < 


270° 


0.147 


lb(i) 


270° 


< I < 


360° 


0.146 



2b(g) 
2b(g) 
2b(g) 
2b(g) 
2b(g) 

2b(r) 
2b(r) 
2b(r) 
2b(r) 
2b(r) 

2b(i) 
2b(i) 
2b(i) 
2b(i) 
2b(i) 



0' 



All 

< / < 



90° 



90° < I < 180° 
180° < / < 270° 
270° < I < 360° 



0' 



All 

< / < 



90° 



90° < / < 180° 
180° < / < 270° 
270° < I < 360° 



0° 



All 

< I < 90° 
90° < / < 180° 
180° < I < 270° 
270° < I < 360° 



+0.005 
-0.002 
+0.025 
-0.015 
+0.034 
-0.005 
+0.005 
-0.016 
+0.023 
-0.015 

+0.005 
-0.003 
+0.032 
-0.020 
+0.030 
-0.005 
+0.007 
-0.019 
+0.026 
-0.013 

+0.005 
-0.003 
+0.026 
-0.015 
+0.036 
-0.007 
+0.008 
-0.018 
+0.020 
-0.012 



0.051 
0.052 
0.051 
0.044 
0.070 



+0.004 
0.002 
+0.004 
0.008 
+0.012 
0.006 
+0.016 
0.005 
+0.017 
0.011 



0.063^ 



,+0.004 
3 -0.002 

060+ - 004 
u - UDU -0.006 

063+ 019 

U.UDCi_ 00g 

057+ - 016 
U-UOl -0.010 
-,+0.019 



0.080" 



0.009 



-+0.004 
-0.002 
3+0.004 
-0.007 

n nfi«+ - 019 
0.0b8_ 007 

2+0.017 
-0.009 

0.088ir f 2 



0.066Z 

o.oes 1 * 



0.058^ 



1.74 
1.62 
1.95 
1.81 
1.75 

1.76 
1.61 
2.07 
1.87 
1.73 



+0.02 
0.02 

+0.07 
0.02 
+0.11 
0.06 
+0.04 
0.06 
+0.17 
0.13 

+0.02 
0.03 
+0.04 
0.02 
+0.08 
0.05 
+0.04 
0.08 
+0.18 
0.14 



s +0.02 
-0.02 
«+0.04 
-0.02 
2+0.09 
-0.05 
-,+0.05 
-0.08 
1 7 o+0.17 
'-■ ,o -0.14 



1.78] 
1.66j 
1.98j 

Lgo 1 " 



1 

1.92 
1.92 
1.70 
2.05 

1.86 
1.95 
1.94 
1.72 
2.02 

1.87 
1.96 
1.93 
1.73 
2.06 



+0.05 
-0.02 
+0.07 
0.07 
+0.14 
0.07 
+0.24 
0.05 
+0.15 
0.09 

+0.05 
0.03 
+0.06 
0.06 
+0.20 
0.09 
+0.19 
0.06 
+0.11 
0.03 

+0.05 
0.03 
+0.06 
0.07 
+0.20 
0.08 
+0.19 
0.06 
+0.12 
-0.07 



6( r reai) 



2, 2D 



.-1/2 



dr„ 



dr p (21) 
Table 3

Two-point correlation function $\xi_{2,3D}(r)$ for 3D cells

Sample   Quadrant          $r_0$ ($h^{-1}$ Mpc)     $\gamma$
1a(g)    All               …                        1.51 +0.01/-0.01
1a(g)    0° < l < 90°      5.74 +0.24/-0.12         1.53 +0.04/-0.07
1a(g)    90° < l < 180°    5.72 +0.17/-0.32         1.49 +0.12/-0.05
1a(g)    180° < l < 270°   5.27 +0.04/-0.12         1.53 +0.06/-0.02
1a(g)    270° < l < 360°   6.14 +0.51/-0.41         1.37 +0.09/-0.12
1a(r)    All               5.95 +0.05/-0.03         1.51 +0.01/-0.01
1a(r)    0° < l < 90°      6.07 +0.24/-0.12         1.54 +0.04/-0.07
1a(r)    90° < l < 180°    6.09 +0.18/-0.38         1.49 +0.12/-0.05
1a(r)    180° < l < 270°   5.48 +0.04/-0.13         1.54 +0.06/-0.02
1a(r)    270° < l < 360°   6.57 +0.55/-0.45         1.38 +0.09/-0.10
1a(i)    All               6.05 +0.06/-0.03         1.51 +0.01/-0.01
1a(i)    0° < l < 90°      6.19 +0.25/-0.13         1.53 +0.03/-0.07
1a(i)    90° < l < 180°    6.16 +0.19/-0.39         1.50 +0.13/-0.04
1a(i)    180° < l < 270°   5.55 +0.04/-0.14         1.54 +0.06/-0.02
1a(i)    270° < l < 360°   6.73 +0.56/-0.45         1.38 +0.09/-0.10
1b(g)    All               …                        …
1b(g)    0° < l < 90°      8.19 +0.68/-0.45         1.47 +0.09/-0.10
1b(g)    90° < l < 180°    7.19 +0.47/-0.64         1.63
1b(g)    180° < l < 270°   7.25 +0.17/-0.54         …
1b(g)    270° < l < 360°   7.75 +0.56/-0.83         …
1b(r)    All               8.44 +0.13/-0.10         1.57 +0.01/-0.01
1b(r)    0° < l < 90°      8.76 +0.84/-0.47         1.53 +0.06/-0.10
1b(r)    90° < l < 180°    8.28 +0.67/-0.88         1.61 +0.16/-0.08
1b(r)    180° < l < 270°   7.44 +0.19/-0.40         1.63 +0.07/-0.03
1b(r)    270° < l < 360°   8.85 +1.17/-0.72         1.49 +0.16/-0.22
1b(i)    All               8.44 +0.12/-0.10         1.59 +0.01/-0.01
1b(i)    0° < l < 90°      8.38 +0.76/-0.42         1.60 +0.07/-0.11
1b(i)    90° < l < 180°    8.38 +0.53/-0.73         1.63 +0.13/-0.07
1b(i)    180° < l < 270°   7.70 +0.28/-0.48         1.60 +0.07/-0.03
1b(i)    270° < l < 360°   9.09 +1.21/-0.75         1.48 +0.14/-0.23
2b(g)    All               7.23 +0.06/-0.05         1.53 +0.01/-0.01
2b(g)    0° < l < 90°      7.07 +0.22/-0.22         1.53 +0.04/-0.04
2b(g)    90° < l < 180°    7.02 +0.09/-0.10         1.60 +0.02/-0.01
2b(g)    180° < l < 270°   7.32 +0.10/-0.18         1.51 +0.04/-0.01
2b(g)    270° < l < 360°   7.34 +0.71/-0.61         1.53 +0.11/-0.08
2b(r)    All               7.83 +0.06/-0.06         …
2b(r)    0° < l < 90°      7.60 +0.23/-0.23         …
2b(r)    90° < l < 180°    7.71 +0.13/-0.16         …
2b(r)    180° < l < 270°   7.89 +0.08/-0.19         …
2b(r)    270° < l < 360°   7.94 +0.76/-0.76         …
2b(i)    All               7.98 +0.06/-0.06         …
2b(i)    0° < l < 90°      7.73 +0.23/-0.22         …
2b(i)    90° < l < 180°    7.92 +0.13/-0.16         …
2b(i)    180° < l < 270°   8.02 +0.06/-0.18         …
2b(i)    270° < l < 360°   8.14 +0.86/-0.85         …


where $r_{\rm real}$ is the real space position, $r_r$ is the radial position along the line of sight and $r_p$ is the position in the direction perpendicular to the line of sight. We write the projected two-point correlation function in terms of angles on the sky, so equation (21) does not apply directly to the comparison between the 2D and 3D samples in this study without a conversion factor between $\theta$ and $r_p$. However, a detailed analysis of equation (21) by Totsuji & Kihara (1969) shows that for a power law $\xi_{2,2D}(\theta) \propto \theta^{-\gamma+1}$, the real space correlation function follows $\xi(r_{\rm real}) \propto r_{\rm real}^{-\gamma}$. The value of $\gamma$ for the 2D projected samples therefore follows the value of $\gamma$ for the correlation function in real space. Hence we can simply compare the value of $\gamma$ between the 2D projected samples and the 3D redshift space samples to quantify the redshift space distortions.
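The following numerical check illustrates the power-law statement above. It is written in terms of the projected separation $r_p$ rather than $\theta$; at a fixed distance the two are proportional in the small-angle limit, which is an assumption of this sketch. It uses the forward relation that equation (21) inverts:

$$\xi_{2,2D}(r_p) = 2\int_0^\infty \xi\big((r_p^2 + r_r^2)^{1/2}\big)\,dr_r$$

The check shows that a real-space slope of $-\gamma$ becomes a projected slope of $1-\gamma$. For comparison, it uses the standard closed form for $\gamma > 1$. The function names are illustrative.

```python
import math


def xi_real(r, r0, gamma):
    """Real-space power law xi(r) = (r / r0)**(-gamma)."""
    return (r / r0) ** (-gamma)


def xi_projected_numeric(r_p, r0, gamma, n=20_000):
    """2 * integral_0^inf xi(sqrt(r_p^2 + r_r^2)) dr_r by the midpoint rule.

    Substituting r_r = r_p * tan(t) maps (0, inf) onto (0, pi/2), with
    dr_r = r_p / cos(t)**2 dt and sqrt(r_p^2 + r_r^2) = r_p / cos(t).
    """
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        c = math.cos((k + 0.5) * h)
        total += xi_real(r_p / c, r0, gamma) * r_p / c ** 2
    return 2.0 * total * h


def xi_projected_exact(r_p, r0, gamma):
    """Closed form for gamma > 1: r_p (r_p/r0)^-gamma sqrt(pi) Gamma((gamma-1)/2) / Gamma(gamma/2)."""
    return (r_p * (r_p / r0) ** (-gamma) * math.sqrt(math.pi)
            * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2))


gamma, r0 = 1.8, 5.0  # illustrative values
w1 = xi_projected_numeric(1.0, r0, gamma)
w10 = xi_projected_numeric(10.0, r0, gamma)
slope = math.log(w10 / w1) / math.log(10.0)
print(slope)  # close to 1 - gamma = -0.8
```

The substitution keeps the midpoint rule away from the integrable singularity at $t = \pi/2$, so a modest number of points already matches the closed form closely.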

Comparison of Tables 2 and 3 shows that the value of $\gamma$ for the 2D samples (projection) is consistently higher than the value of $\gamma$ for the 3D samples (redshift space), by about 0.2 ~ 0.3. This agrees with earlier work by Fry & Gaztañaga (1994) and Hawkins et al. (2003), and indicates that the counts-in-cells method used in this study correctly describes galaxy clustering and distortions in redshift space. However, because redshift space distortions affect any analysis of large-scale structure based on the 3D samples, results from the 3D samples may be less reliable than those from the 2D samples.

Comparing samples with different color selection criteria, we find generally no significant difference in the value of $\gamma$ between samples selected using different colors within the same redshift range. In some cases $r_0$ and $\theta_0$ differ, which can be attributed to the slightly different spatial densities of samples selected using different colors (cf. Table 1). Samples selected using different colors therefore cluster similarly. For this reason, we focus our subsequent analysis on the r-band selected samples, because they have the highest spatial density. Figure 2 illustrates the comparison between the selection criteria for different colors; in particular, the slopes of the power law fits for samples selected using different colors are close to each other.

Comparing the power law fit parameters $r_0$, $\theta_0$ and $\gamma$ between quadrants, we note differences that are significant at the $1\sigma$ level. These differences are consistent across different color and magnitude cuts but not across redshift ranges. For example, the value of $\gamma$ in the q2 subsample for 2D cells is significantly lower than in the other quadrants for the low redshift range, but this is not the case in the high redshift range. We see similar behavior for the q3 subsample in the 2D high redshift range, and for the q4 subsample in the 3D low redshift range. For more insight into the differences between quadrants, we look at the counts-in-cells distribution $f_V(N)$.

4.2. Counts-in-cells distribution $f_V(N)$

Using measurements of the two-point correlation function, we can define three regimes to examine in detail. Since the two-point correlation function exhibits a break at about $12\,h^{-1}$ Mpc for 3D cells and at about 2° for 2D




[Figure 2 panels. Left column: two-point correlation function against projected cell radius $\theta$ (°) for 2D cells. Right column: against spherical cell radius $r$ ($h^{-1}$ Mpc) for 3D cells. Legends: 1a samples, $0.04 < z < 0.12$ with $M_i < -20.6$, $M_r < -20.2$, $M_g < -19.5$; 1b samples, $0.04 < z < 0.12$ with $M_i < -21.9$, $M_r < -21.5$, $M_g < -20.7$; 2b samples, $0.12 < z < 0.20$ with $M_i < -21.9$, $M_r < -21.5$, $M_g < -20.7$.]

Figure 2. Two-point correlation functions of different samples. Top left: 2D cells, 1a samples. Middle left: 2D cells, 1b samples. Bottom left: 2D cells, 2b samples. Top right: 3D cells, 1a samples. Middle right: 3D cells, 1b samples. Bottom right: 3D cells, 2b samples.
Table 4

Cells used for counts-in-cells analysis

Size     Redshift            Comoving volume ($10^4\,h^{-3}$ Mpc$^3$)

2D projected cells, $\theta$ (°)
1.00     0.04 < z < 0.12     1.31
2.00     0.04 < z < 0.12     5.25
4.00     0.04 < z < 0.12     20.98
1.00     0.12 < z < 0.20     4.60
2.00     0.12 < z < 0.20     18.38
4.00     0.12 < z < 0.20     73.49

3D spherical cells, $r$ ($h^{-1}$ Mpc)
6.0      0.04 < z < 0.12     0.090
12.0     0.04 < z < 0.12     0.724
24.0     0.04 < z < 0.12     5.79
6.0      0.12 < z < 0.20     0.090
12.0     0.12 < z < 0.20     0.724
24.0     0.12 < z < 0.20     5.79



cells, we look at $f_V(N)$ for cell sizes of 6.0, 12.0 and 24.0 $h^{-1}$ Mpc for 3D cells, and 1.00°, 2.00° and 4.00° for 2D cells. The analysis of the two-point correlation function has shown that samples selected using different colors are similarly clustered. We therefore focus our subsequent analysis of the counts-in-cells distribution on the r-band samples, which have the highest spatial density for a given redshift and magnitude cut. We summarize the cells we use in Table 4.
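The comoving volumes in Table 4 follow from simple geometry. A 2D cell is a cone of angular radius $\theta$ (solid angle $2\pi(1-\cos\theta)$) between two comoving distances, and a 3D cell is a sphere. The sketch below assumes a flat ΛCDM cosmology with $\Omega_m = 0.3$. The cosmological parameters are not stated in this section, so this is an assumption, although in our check it closely reproduces the tabulated values. Function names are illustrative.

```python
import math

C_OVER_H0 = 2997.92458  # Hubble distance c/H0 in h^-1 Mpc


def comoving_distance(z, omega_m=0.3, n=1000):
    """Comoving distance in h^-1 Mpc for an assumed flat LambdaCDM model (Simpson's rule)."""
    if n % 2:
        n += 1
    h = z / n

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))

    s = inv_e(0.0) + inv_e(z)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * inv_e(k * h)
    return C_OVER_H0 * s * h / 3.0


def cone_volume(theta_deg, z_min, z_max, omega_m=0.3):
    """Comoving volume (h^-3 Mpc^3) of a circular cell of angular radius theta between z_min and z_max."""
    solid_angle = 2.0 * math.pi * (1.0 - math.cos(math.radians(theta_deg)))
    d1 = comoving_distance(z_min, omega_m)
    d2 = comoving_distance(z_max, omega_m)
    return solid_angle * (d2 ** 3 - d1 ** 3) / 3.0


def sphere_volume(r):
    """Volume (h^-3 Mpc^3) of a spherical cell of radius r in h^-1 Mpc."""
    return 4.0 / 3.0 * math.pi * r ** 3


for theta in (1.0, 2.0, 4.0):
    low_z = cone_volume(theta, 0.04, 0.12) / 1e4
    high_z = cone_volume(theta, 0.12, 0.20) / 1e4
    print(theta, round(low_z, 2), round(high_z, 2))
for radius in (6.0, 12.0, 24.0):
    print(radius, round(sphere_volume(radius) / 1e4, 3))
```

The 3D cell volumes do not depend on redshift, which is why they repeat in both redshift ranges of Table 4.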

From the analysis of the two-point correlation function, we note that there may be considerable variation between quadrants, so the sample we have used might not be homogeneous. A simple test of homogeneity is whether different quadrants, being disjoint subsamples of the full sample, are identically distributed. To compare quadrants, we use a random permutation test (Dwass 1957) with 100,000 permutations to compare $\bar N$ and $\langle(\Delta N)^2\rangle$ across quadrants. Equal means and variances are a necessary, but not sufficient, condition for the distributions of two samples to be equivalent.

To compare $\bar N$, we define the following quantities for two given quadrants A and B:

- the observed test statistic $T_{\bar N}({\rm obs}) = |\bar N_A - \bar N_B|$, the difference in $\bar N$ between the two quadrants;
- the test statistic $T_{\bar N} = |\bar N_{A_p} - \bar N_{B_p}|$, the difference in $\bar N$ for a given permutation $p$;
- the null hypothesis that both quadrants have the same $\bar N$.

Each permutation is constructed by shuffling cells at random across quadrants. The p-value is the frequency of sampled permutations for which $T_{\bar N} > T_{\bar N}({\rm obs})$. For the variance, we define a similar test statistic $T_\Delta = |\langle(\Delta N)^2\rangle_{A_p} - \langle(\Delta N)^2\rangle_{B_p}|$ and use the same procedure.
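The sketch below implements the permutation test described above for a single pair of quadrants. It assumes that "shuffling cells at random across quadrants" means shuffling the pooled cells of the two quadrants being compared and re-splitting them into groups of the original sizes. Under the null hypothesis, the test also treats cells as exchangeable, which is an idealization given that nearby cells are correlated. The function name and the synthetic counts are hypothetical.

```python
import numpy as np


def permutation_p_value(cells_a, cells_b, statistic=np.mean, n_perm=100_000, seed=0):
    """Monte Carlo permutation test of H0: statistic(A) == statistic(B).

    cells_a, cells_b: galaxy counts N in the cells of quadrants A and B.
    Pass statistic=np.mean for T_Nbar and statistic=np.var for T_Delta.
    Returns the fraction of permutations with T > T(obs).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(cells_a, dtype=float)
    b = np.asarray(cells_b, dtype=float)
    pooled = np.concatenate([a, b])
    t_obs = abs(statistic(a) - statistic(b))
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        t = abs(statistic(shuffled[: a.size]) - statistic(shuffled[a.size:]))
        if t > t_obs:
            exceed += 1
    return exceed / n_perm


# Synthetic Poisson counts for two quadrants (illustrative, not SDSS data).
rng = np.random.default_rng(42)
q_a, q_b = rng.poisson(5.0, 500), rng.poisson(5.0, 500)
p_mean = permutation_p_value(q_a, q_b, np.mean, n_perm=2000)
p_var = permutation_p_value(q_a, q_b, np.var, n_perm=2000)
print(p_mean, p_var)
```

A pair is judged to have the same mean (or variance) at the 95% level when the p-value exceeds 0.05. `np.var` uses the population variance, matching $\langle(\Delta N)^2\rangle$.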

In the 3D 1a(r) $6.0\,h^{-1}$ Mpc sample, we cannot reject at the 95% level the hypothesis that q1 and q4 have the same mean and variance. The same holds for the q1-q2, q1-q3 and q3-q4 pairs in the 2b(r) $6.0\,h^{-1}$ Mpc sample, and for the q1-q2 and q2-q3 pairs in the 2b(r) $12.0\,h^{-1}$ Mpc sample. In all other cases, we can reject at the 95% level the null hypothesis that $\bar N$ and $\langle(\Delta N)^2\rangle$ are equal. We note that the q1-q4 and q2-q4 pairs in the 2b(r) $6.0\,h^{-1}$ Mpc sample and the q2-q3 pair in the 2b(r) $12.0\,h^{-1}$ Mpc sample were not found to have the same mean and variance. Although some quadrant pairs seem similar, this similarity is not transitive: the variation among three quadrants can be large enough that not every pair of quadrants is identically distributed.

Quadrants are generally not identically distributed. Jackknife errors assume that the subsamples are drawn from the same underlying distribution, so this result suggests that they underestimate the true range of variability in the data. A better and simpler estimate of variability is to compare $f_V(N)$ across quadrants, and we use the quadrant to quadrant variations in $f_V(N)$ as a measure of the variability of the counts-in-cells statistic. Figures 3, 4 and 5 show:

- the observed counts-in-cells distribution $f_V(N)$;
- the minimum and maximum values of $f_V(N)$ from each quadrant, as a shaded band;
- the GQED and NBD with the same mean and variance as the observed $f_V(N)$;
- the jackknife errors, as error bars, to illustrate the difference between the jackknife errors and the quadrant to quadrant variations.
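The shaded band described above is the pointwise minimum and maximum of the empirical $f_V(N)$ over quadrants. A minimal sketch, assuming each quadrant is supplied as an array of galaxy counts per cell (function names and toy counts are illustrative):

```python
import numpy as np


def f_v(counts, n_max):
    """Empirical counts-in-cells distribution f_V(N) for N = 0..n_max."""
    counts = np.asarray(counts, dtype=int)
    return np.bincount(counts, minlength=n_max + 1)[: n_max + 1] / counts.size


def quadrant_band(counts_by_quadrant):
    """Pointwise minimum and maximum of f_V(N) over the quadrants (the shaded band)."""
    n_max = max(int(np.max(c)) for c in counts_by_quadrant)
    curves = np.array([f_v(c, n_max) for c in counts_by_quadrant])
    return curves.min(axis=0), curves.max(axis=0)


# Toy cell counts for four quadrants (illustrative numbers only).
quadrants = [[0, 1, 1, 2], [0, 0, 1, 2], [1, 1, 2, 2], [0, 1, 2, 2]]
low, high = quadrant_band(quadrants)
print(low, high)
```

Unlike jackknife error bars, this band makes no assumption that the quadrants share one underlying distribution.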

We also repeat the analysis with the quadrant boundaries shifted by 30°, 45° and 60° in galactic longitude. In all cases, we see a similar amount of variation, which suggests that the variation between quadrants is not an artifact of the choice of quadrant boundaries. To illustrate, Figure 6 shows the variation between quadrants for different quadrant boundaries. Although the variation between quadrants differs in detail for different boundaries, its amplitude is approximately the same. These variations are therefore probably caused by fluctuations in the number density of galaxies on scales about as large as the quadrants themselves, indicating a significant amount of cosmic variance.
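Shifting the quadrant boundaries amounts to rotating the longitude origin before binning into 90° sectors. A minimal sketch, assuming cells are assigned to quadrants by the galactic longitude $l$ of their centers (the helper name is hypothetical):

```python
import numpy as np


def quadrant_index(l_deg, shift_deg=0.0):
    """Quadrant label 1..4 for galactic longitude l, with boundaries rotated by shift_deg.

    With shift_deg = 0 the quadrants are 0-90, 90-180, 180-270 and 270-360 degrees.
    """
    l_shifted = np.mod(np.asarray(l_deg, dtype=float) - shift_deg, 360.0)
    return np.floor(l_shifted / 90.0).astype(int) + 1


longitudes = [10.0, 100.0, 190.0, 280.0]
for shift in (0.0, 30.0, 45.0, 60.0):
    print(shift, quadrant_index(longitudes, shift))
```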

4.2.1. Comparison with models 

To study the parameters $b$ and $g$ and compare the observed $f_V(N)$ to models, we use equations (11) and (13) to obtain $b$ and $g$ self-consistently from the mean $\bar N$ and variance $\langle(\Delta N)^2\rangle$, so that the theoretical distributions have the same mean and variance as the observed $f_V(N)$. The populations of nearby cells are often strongly correlated, so the cells are in general not independent and a $\chi^2$ fit cannot be used. Instead, as a qualitative measure of goodness of fit, we compute the least squares distance between the observed distribution and the theoretical distribution:

$$\sum_{N=0}^{N_{\max}} \left[f_V(N)_{\rm obs} - f_V(N)\right]^2 \qquad (22)$$

where $N_{\max}$ is the largest number of galaxies in a cell. We use this measure to determine which model is closer to the observed data.
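Equations (11) and (13) are not reproduced in this section. The sketch below therefore assumes two standard forms:

- the GQED, $f_V(N) = \frac{\bar N(1-b)}{N!}[\bar N(1-b)+Nb]^{N-1}e^{-\bar N(1-b)-Nb}$, whose variance is $\bar N/(1-b)^2$;
- the NBD in the parameterization with variance $\bar N(1+g\bar N)$.

Matching these to the observed mean and variance fixes $b$ and $g$. The sketch then evaluates the least squares distance of equation (22). It assumes the counts are over-dispersed (variance greater than mean), so that $0 < b < 1$ and $g > 0$. Function names are illustrative.

```python
import math
import numpy as np


def gqed_pmf(n, nbar, b):
    """GQED: f(N) = nbar(1-b)/N! [nbar(1-b) + N b]^(N-1) exp[-nbar(1-b) - N b]."""
    a = nbar * (1.0 - b)
    return np.array([math.exp(math.log(a) + (k - 1) * math.log(a + k * b)
                              - a - k * b - math.lgamma(k + 1)) for k in n])


def nbd_pmf(n, nbar, g):
    """NBD: f(N) = Gamma(N + 1/g) / (Gamma(1/g) N!) (g nbar)^N / (1 + g nbar)^(N + 1/g)."""
    r = 1.0 / g
    return np.array([math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                              + k * math.log(g * nbar) - (k + r) * math.log(1.0 + g * nbar))
                     for k in n])


def compare_models(counts):
    """Match b and g to the sample mean and variance; return b, g and the eq. (22) distances."""
    counts = np.asarray(counts, dtype=int)
    nbar, var = counts.mean(), counts.var()
    b = 1.0 - math.sqrt(nbar / var)   # GQED: var = nbar / (1 - b)^2
    g = (var - nbar) / nbar ** 2      # NBD:  var = nbar (1 + g nbar)
    n = np.arange(counts.max() + 1)   # N = 0 .. N_max
    f_obs = np.bincount(counts) / counts.size
    d_gqed = float(np.sum((f_obs - gqed_pmf(n, nbar, b)) ** 2))
    d_nbd = float(np.sum((f_obs - nbd_pmf(n, nbar, g)) ** 2))
    return b, g, d_gqed, d_nbd


# Synthetic clustered counts with mean 5 and variance 12 (illustrative only).
rng = np.random.default_rng(1)
sample = rng.negative_binomial(1 / 0.28, 1 / (1 + 0.28 * 5.0), size=20_000)
print(compare_models(sample))
```

Working in log space via `math.lgamma` avoids overflow in the factorials and Gamma functions for large $N$.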

We summarize our results in Tables 5 and 6. As expected, our results show large differences in $\bar N$, $b$ and $g$ between quadrants. On the basis of the least squares distance alone, $f_V(N)$ for 2D projected cells tends to follow the GQED, while $f_V(N)$ for 3D spherical cells tends to follow the NBD. However, in most cases the GQED and NBD both fall within the measured quadrant to quadrant variations, and we note that
Figure 3. r-band counts-in-cells of different samples. Top left: 1.00° cells, 1a(r) sample; Middle left: 1.00° cells, 1b(r) sample; Bottom left: 1.00° cells, 2b(r) sample. Top right: $6.0\,h^{-1}$ Mpc cells, 1a(r) sample; Middle right: $6.0\,h^{-1}$ Mpc cells, 1b(r) sample; Bottom right: $6.0\,h^{-1}$ Mpc cells, 2b(r) sample. The shaded band represents the extent of quadrant to quadrant variations and the error bars represent the jackknife errors.
Figure 4. r-band counts-in-cells of different samples. Top left: 2.00° cells, 1a(r) sample; Middle left: 2.00° cells, 1b(r) sample; Bottom left: 2.00° cells, 2b(r) sample. Top right: $12.0\,h^{-1}$ Mpc cells, 1a(r) sample; Middle right: $12.0\,h^{-1}$ Mpc cells, 1b(r) sample; Bottom right: $12.0\,h^{-1}$ Mpc cells, 2b(r) sample. The shaded band represents the extent of quadrant to quadrant variations and the error bars represent the jackknife errors.
Figure 5. r-band counts-in-cells of different samples. Top left: 4.00° cells, 1a(r) sample; Middle left: 4.00° cells, 1b(r) sample; Bottom left: 4.00° cells, 2b(r) sample. Top right: $24.0\,h^{-1}$ Mpc cells, 1a(r) sample; Middle right: $24.0\,h^{-1}$ Mpc cells, 1b(r) sample; Bottom right: $24.0\,h^{-1}$ Mpc cells, 2b(r) sample. The shaded band represents the extent of quadrant to quadrant variations and the error bars represent the jackknife errors.
Figure 6. r-band counts-in-cells for $24.0\,h^{-1}$ Mpc cells with different quadrant boundaries. Top left: 0° shift; Top right: 45° shift; Bottom left: 30° shift; Bottom right: 60° shift. The shaded band represents the extent of quadrant to quadrant variations using the selected quadrant boundaries. For clarity, error bars are not plotted here.
Table 5

r-band 2D counts-in-cells $f_V(N)$

Sample   Quadrant   Cells   $\bar N$   $b$   $g$   $\chi \times 10^5$ (GQED)   $\chi \times 10^5$ (NBD)

$\theta = 1.00°$

1a(r)   All
the observed $f_V(N)$ for individual quadrants within a sample may be closer to either the GQED or the NBD. The SDSS catalog therefore cannot distinguish between the GQED and NBD, because the difference between them is smaller than the quadrant to quadrant variations. Since the NBD has been shown to be unphysical (Saslaw & Fang 1996), we use the GQED for further analysis on physical grounds.

4.2.2. Comparison between redshift ranges 

Comparing the values of $\bar N$ and $b$ between redshift ranges, we note that $\bar N$, and correspondingly $b$, are lower in the low redshift range than in the high redshift range for the same magnitude cutoff (samples 1b and 2b). This is despite the presence of the SDSS "great wall" in the low redshift range.

On closer inspection, we note that the "great wall" spans the region $0.065 < z < 0.09$, $140° < \alpha < 210°$ and $-3° < \delta < 6°$ (Deng et al. 2006), which is a small fraction of the SDSS footprint. The observed quadrant to quadrant variations of $f_V(N)$ do not seem to depart much from the GQED, so the presence of a large supercluster probably does not affect the observed form of $f_V(N)$. This result agrees with an earlier analysis of the 2MASS catalog by Sivakoff & Saslaw (2005), who found that including the Shapley supercluster did not make a large difference to the agreement with the GQED. These large superclusters may therefore be natural consequences of gravitational interactions among galaxies.

Table 6

r-band 3D counts-in-cells $f_V(N)$

Sample   Quadrant   Cells   $\bar N$   $b$   $g$   $\chi \times 10^5$ (GQED)   $\chi \times 10^5$ (NBD)

All

Since the "great wall" probably does not make a large contribution to the counts-in-cells statistics, the difference in $\bar N$ and $b$ between the high and low redshift ranges could be due to evolution or selection. Further pursuit of these possibilities requires more detailed models. Here we note that the difference is consistent across quadrants, which rules out cosmic variance as a dominant cause. We also note that the difference in lookback time between the middles of the low and high redshift ranges is about 0.7 Gyr. This is close to the time a merging pair of galaxies takes to merge (Conselice 2009), so galaxy mergers might contribute to the difference between the low and high redshift ranges.

4.2.3. Comparison between magnitude cutoffs

As expected from the different magnitude cutoffs, the values of $\bar N$ and $b$ are lower for the brighter 1b sample than
Table 7

Examples of observed and expected values of $b$ for bright randomly selected subsamples
Figure 7. The value of $b$ as a function of cell radius for bright and faint galaxies. The bottom plot is the difference between the observed and expected value of $b$ for randomly selected bright galaxies. For a given cell size, the variations between quadrants range over 2.5% for the complete sample and 10% for the brighter sample.

the fainter 1a sample. Following Lahav & Saslaw (1992), we can compare the value of $b$ for the brighter sample with the value of $b$ expected if the brighter sample were a randomly selected subsample of a fainter parent sample:

$$(1 - b_{\rm sample})^2 = \frac{(1 - b_{\rm parent})^2}{1 - \left(1 - \bar N_{\rm sample}/\bar N_{\rm parent}\right)(2 - b_{\rm parent})\,b_{\rm parent}}$$

This allows us to compare the clustering of the brighter 1b sample with that of the fainter 1a sample. For all samples, the brighter galaxies from the 1b samples have a higher value of $b$ than would be expected if they were a random subsample of the 1a sample. Bright galaxies are therefore more strongly clustered than fainter galaxies. This may be because brighter galaxies are concentrated around cluster centers or in dark matter haloes. If so, it suggests an upper limit of about $10\,h^{-1}$ Mpc for the size of these haloes. We summarize the comparisons in Table 7 and plot the comparison of $b$ for 3D cells over a range of cell sizes in Figure 7.
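The random-subsample expectation can be checked against the moments of random selection. If each galaxy is kept independently with probability $p = \bar N_{\rm sample}/\bar N_{\rm parent}$ (binomial thinning, the assumption behind such a formula), the subsample variance is $p^2\langle(\Delta N)^2\rangle_{\rm parent} + p(1-p)\bar N_{\rm parent}$. Combined with the GQED relation $\langle(\Delta N)^2\rangle = \bar N/(1-b)^2$, this gives the same $b$ as the relation above. The parent values in the sketch are illustrative, not taken from the SDSS samples.

```python
import math


def b_from_moments(nbar, variance):
    """GQED b from the mean and variance of counts: variance = nbar / (1 - b)^2."""
    return 1.0 - math.sqrt(nbar / variance)


def b_random_subsample(b_parent, frac):
    """Expected b for a random subsample with frac = Nbar_sample / Nbar_parent."""
    one_minus_b_sq = (1.0 - b_parent) ** 2 / (1.0 - (1.0 - frac) * (2.0 - b_parent) * b_parent)
    return 1.0 - math.sqrt(one_minus_b_sq)


# Cross-check with binomial thinning of a parent with nbar = 20, b = 0.5 (illustrative values).
nbar_p, b_p, frac = 20.0, 0.5, 0.3
var_p = nbar_p / (1.0 - b_p) ** 2
var_s = frac ** 2 * var_p + frac * (1.0 - frac) * nbar_p  # variance after random selection
print(b_random_subsample(b_p, frac), b_from_moments(frac * nbar_p, var_s))
```

An observed $b$ for the bright sample that exceeds `b_random_subsample` indicates clustering stronger than random selection would produce. The limits behave as expected: `frac = 1` returns the parent $b$, and `frac` near 0 returns $b \approx 0$ (Poisson).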

5. DISCUSSION AND CONCLUSION 

In this paper we have compared the counts-in-cells distribution of galaxies $f_V(N)$ with two theoretical models. We found that the observed distribution has large field to field variations, which may be as great as 20% across quadrants. These large variations mean that there is a considerable amount of cosmic variance in the data, and that the galaxies in different quadrants are not identically distributed. It follows that errors determined using the jackknife procedure will underestimate the true range of variation in the data, because different subsets of the data are not identically distributed. The bumps in the counts-in-cells distribution for large cells reveal these subregions of different local density. As shown by Coleman & Saslaw (1990), such bumps may be the result of regions of different local density.



As suggested by Sylos Labini et al. (2009), a larger survey volume will show smaller effects of cosmic variance. An earlier analysis of the 2MASS catalog by Sivakoff & Saslaw (2005) hints that this may be the case: they found less variation between quadrants, on the order of 5% instead of the 20% we have found in the SDSS. After excluding regions within 20° of the galactic plane, the 2MASS catalog covers two-thirds of the sky, more than three times the coverage of the SDSS.
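The "two-thirds of the sky" figure follows from spherical geometry. The band of galactic latitude $|b| < b_{\rm cut}$ covers a fraction $\sin b_{\rm cut}$ of the sphere. The sketch assumes the 20° exclusion is in galactic latitude. The SDSS area is not given in this section, so the "more than three times" comparison is not reproduced here.

```python
import math

FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2  # about 41253 square degrees


def sky_fraction_beyond_latitude(b_cut_deg):
    """Fraction of the sphere at |galactic latitude| >= b_cut; the band |b| < b_cut covers sin(b_cut)."""
    return 1.0 - math.sin(math.radians(b_cut_deg))


frac = sky_fraction_beyond_latitude(20.0)
print(round(frac, 3), round(frac * FULL_SKY_DEG2))  # fraction of the sky and area in deg^2
```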

Although the SDSS "great wall" is contained within the low redshift range, it covers only a small fraction of the SDSS footprint, in a stripe within 6° of the celestial equator. Its overdensity is therefore not large enough to dominate over the rest of the survey, so we do not see much difference between the low and high redshift ranges beyond differences in $\bar N$. Indeed, because the "great wall" is most likely porous in three dimensions, it is not clear that "great wall" is an accurate description.

Our comparison of $\gamma$ for 2D projected cells and 3D spherical cells in redshift space shows a difference in the exponent $\gamma$ of the two-point correlation function between the 2D and 3D samples. The 3D samples have a lower value of $\gamma$ than the 2D samples because, in redshift space, they are affected by peculiar velocity distortions that change the apparent clustering. Using the well-known relation between the projected correlation function and the real space correlation function, we find that the difference in $\gamma$ between the 2D and 3D samples is a measure of the redshift space distortions. Our values of $\gamma$ suggest that the difference between the projected and redshift space samples is about 0.2 ~ 0.3. This agrees with work by Fry & Gaztañaga (1994) and Hawkins et al. (2003) using earlier catalogs.

Comparing the low and high redshift ranges, we find a large difference in $\bar N$ for samples with the same magnitude cutoff, which may be caused by galaxy mergers. We also find that brighter galaxies are more strongly clustered than a random subset of all galaxies in the low redshift range. This may be because bright galaxies cluster around cluster centers or in dark matter haloes.

The analysis of the counts-in-cells distribution shows that the observed $f_V(N)$ may follow either the GQED or the NBD: 2D projected cells generally prefer the GQED, while 3D spherical cells often prefer the NBD. However, both distributions generally lie within the range of quadrant to quadrant variations, so we conclude that both the GQED and the NBD agree with observations.

Since the GQED is physically motivated while the NBD was found to be unphysical, we can reject the NBD as a physically complete description of galaxy clustering. This conclusion is not at odds with the results of Conroy et al. (2005). Their error bars exclude the GQED, but they are jackknife errors, which, as our results suggest, underestimate the true range of variability in the data.

We conclude that while the SDSS is an excellent sample, there remains a considerable amount of cosmic variance that will probably require an all-sky survey to resolve. Nevertheless, the counts-in-cells analysis of the SDSS data has shown that observations of $f_V(N)$ agree with the GQED within cosmic variance.